Functions

R4DS 15 - Functions

lruolin
05-25-2021

R4DS Practice 15: Functions

The code below works through the practice exercises in R for Data Science (https://r4ds.had.co.nz/), with reference to the solutions at https://jrnold.github.io/r4ds-exercise-solutions/

Let’s begin now

Loading the tidyverse package:

library(tidyverse)

Introduction

Why are functions important?

When should you write a function?

# rnorm - random generation for normal distribution

df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

df
# A tibble: 10 x 4
        a      b       c       d
    <dbl>  <dbl>   <dbl>   <dbl>
 1  2.33  -0.557  0.889  -1.45  
 2 -0.650  0.272 -1.92   -0.0814
 3 -0.647  0.299  1.10    0.243 
 4  0.586  0.373  0.175   0.845 
 5 -0.953 -1.04  -0.410  -1.86  
 6 -0.293 -1.52   0.888  -1.29  
 7  0.654  0.169 -0.766  -0.775 
 8  0.980 -0.423 -0.280   0.292 
 9 -1.96   0.162  0.0332  1.08  
10 -0.182 -0.454 -0.667   0.753 
# Manual coding

df$a <- (df$a - min(df$a, na.rm = TRUE)) /
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
  (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) /
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

# How to reduce copying, pasting, and manual replacing?

# Identify the number of inputs:

# - 1 variable: a numeric vector

x <- df$a
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
# (printed values omitted: df$a was already rescaled above, so x comes back unchanged, between 0 and 1)
range <- range(x, na.rm = TRUE)
range # good practice to give names to intermediate calculations
[1] 0 1
# After trying the code out on a simple input,
# you can turn it into a function:

# a. choose a name for the function

# b. list the inputs: function(input variable)

# c. place the code into the body of the function

rescale01 <- function(x) {
  range <- range(x, na.rm = T)
  (x - range[1])/(range[2] - range[1])
}

rescale01(c(0,5,10))
[1] 0.0 0.5 1.0
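With the function in hand, the four copy-and-paste assignments from the manual version can shrink to a single line. Below is a minimal sketch of that idea. The seed, the name df_demo, and the use of a base data.frame (so the block runs without tidyverse) are my own assumptions for illustration, and the intermediate value is called rng here to avoid reusing the name of the range() function.

```r
# Same logic as rescale01() above, repeated so this block runs on its own
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

set.seed(1)  # illustrative seed so the sketch is reproducible
df_demo <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))

# Rescale every column in one step; df_demo[] keeps the data frame structure
df_demo[] <- lapply(df_demo, rescale01)

# Each column should now run from 0 to 1
sapply(df_demo, range)
```

If dplyr 1.0 or later is loaded, `mutate(df_demo, across(everything(), rescale01))` should do the same job in tidyverse style.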
# What if there are Inf values?

x <- c(1:10, Inf)
x
 [1]   1   2   3   4   5   6   7   8   9  10 Inf
rescale01(x) # not an error, but not useful: the infinite maximum turns every finite value into 0 and Inf into NaN
 [1]   0   0   0   0   0   0   0   0   0   0 NaN
# Let's fix the function
rescale01_inf <- function(x) {
  range <- range(x, na.rm = T, finite = T)
  (x - range[1])/(range[2] - range[1])
}

rescale01_inf(x)
 [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
 [7] 0.6666667 0.7777778 0.8888889 1.0000000       Inf
# What if you want to map -Inf to 0, and Inf to 1?

range <- range(x, na.rm = T, finite = T)
y <- (x - range[1])/(range[2] - range[1])
y[y == -Inf] <- 0
y[y == Inf] <- 1
y
 [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
 [7] 0.6666667 0.7777778 0.8888889 1.0000000 1.0000000
# put into function
rescale01_inf_b <- function(x) {
  range <- range(x, na.rm = T, finite = T)
  y <- (x - range[1])/(range[2] - range[1])
  y[y==-Inf] <- 0
  y[y==Inf] <- 1
  y
}

rescale01_inf(x)
 [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
 [7] 0.6666667 0.7777778 0.8888889 1.0000000       Inf
rescale01_inf_b(x)
 [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
 [7] 0.6666667 0.7777778 0.8888889 1.0000000 1.0000000
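The test vector above only contains Inf, so the -Inf branch never gets exercised. Here is a small sketch that checks both ends at once. The vector x2 is a made-up input for illustration, and the function body is repeated (with rng as the intermediate name) so the block runs on its own.

```r
# Same logic as rescale01_inf_b() above
rescale01_inf_b <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)  # ignore NA and +/-Inf when finding the range
  y <- (x - rng[1]) / (rng[2] - rng[1])
  y[y == -Inf] <- 0  # map -Inf to the bottom of the scale
  y[y == Inf] <- 1   # map Inf to the top of the scale
  y
}

x2 <- c(-Inf, 0, 5, 10, Inf)  # hypothetical input with both infinities
rescale01_inf_b(x2)
# 0.0 0.0 0.5 1.0 1.0
```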

Practice turning the following code snippets into functions

# to calculate the proportion of na values

x <- c(0, 1, 2, NA, 4, NA)

mean(is.na(x)) # number of NA as proportion
[1] 0.3333333
# write the function
prop_na <- function(x) {
  mean(is.na(x))
  
}

prop_na(x)
[1] 0.3333333
# to standardize the vector so that it sums to 1
x/sum(x, na.rm = T)
[1] 0.0000000 0.1428571 0.2857143        NA 0.5714286        NA
# write the function

sum_to_one <- function(x, na.rm = F){
  x/sum(x, na.rm = na.rm)
  
}

sum_to_one(1:5)
[1] 0.06666667 0.13333333 0.20000000 0.26666667 0.33333333
sum_to_one(c(1:5, NA))
[1] NA NA NA NA NA NA
sum_to_one(c(1:5, NA), na.rm = T)
[1] 0.06666667 0.13333333 0.20000000 0.26666667 0.33333333         NA
# to calculate the coefficient of variation

sd(x, na.rm = T)/mean(x, na.rm = T)
[1] 0.9759001
calc_coefficient_variation <- function(x, na.rm = F){
  sd(x, na.rm = na.rm)/ mean(x,na.rm = na.rm)
  
}

calc_coefficient_variation(1:5)
[1] 0.5270463

Compute the sample variance

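variance <- function(x, na.rm = TRUE) {
  # drop missing values first so that n counts only the values actually used
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  sq_err <- (x - m)^2
  sum(sq_err) / (n - 1) # parentheses matter: divide by (n - 1)
}

variance(1:10)
[1] 9.166667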
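A hand-written variance function is easy to get subtly wrong (operator precedence in the denominator, or counting NA values in n), so it is worth comparing it against base R's var(). A minimal sketch of such a check follows. The name variance_demo and the test vector x_na are hypothetical, chosen only for this example.

```r
# Hypothetical helper for this sketch: sample variance with optional NA removal
variance_demo <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]    # drop NAs before counting observations
  n <- length(x)
  sum((x - mean(x))^2) / (n - 1)  # divide by (n - 1), not by n and not "/ n - 1"
}

x_na <- c(2, 4, 4, 5, NA, 9)  # made-up data with one missing value

all.equal(variance_demo(1:10), var(1:10))                            # should be TRUE
all.equal(variance_demo(x_na, na.rm = TRUE), var(x_na, na.rm = TRUE))  # should be TRUE
is.na(variance_demo(x_na))  # NA propagates when na.rm = FALSE
```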

Compute the skewness

skewness <- function(x, na.rm = F) {
  n <- length(x)
  m <- mean(x, na.rm = na.rm)
  v <- var(x, na.rm = na.rm)
  sum((x-m)^3 / (n-2)) / v^(3/2)
  
}

skewness(c(1,2,5,100))
[1] 1.494554
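For reference, this is the expression the skewness function above evaluates, written out with the example worked by hand. The notation is my own reading of the code, where Var(x) is what var() returns, that is, the sample variance with an n - 1 denominator.

```latex
\mathrm{Skew}(x) = \frac{\frac{1}{n-2}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\mathrm{Var}(x)^{3/2}},
\qquad
\mathrm{Var}(x) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2

% Worked example for x = (1, 2, 5, 100):
% n = 4, \bar{x} = 27
% \sum (x_i - \bar{x})^3 = (-26)^3 + (-25)^3 + (-22)^3 + 73^3 = 345168
% \mathrm{Var}(x) = (676 + 625 + 484 + 5329) / 3 = 7114 / 3 \approx 2371.33
% \mathrm{Skew}(x) = (345168 / 2) / (7114/3)^{3/2} \approx 1.4946
```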

Write a function: both_na(), that takes two vectors of the same length and returns the number of positions that have an NA in both vectors.

x <- c(1:10, NA)
x
 [1]  1  2  3  4  5  6  7  8  9 10 NA
y <- c(1:10, NA)
y
 [1]  1  2  3  4  5  6  7  8  9 10 NA
sum(is.na(x) & is.na(y))
[1] 1
# write the function

both_na <- function(x, y) {
  sum(is.na(x) & is.na(y))
}

both_na(
  c(NA, 1,2,4),
  c(NA, NA, 1, 4)
)
[1] 1
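The exercise says the two vectors have the same length, but the function itself does not enforce that; with unequal lengths, R recycles the shorter vector inside `&` and the count can be misleading. A minimal sketch of a stricter variant is below. The name both_na_checked is hypothetical, and whether to stop or merely warn on a length mismatch is a design choice I am assuming here.

```r
# Hypothetical stricter variant: refuse inputs of different lengths
both_na_checked <- function(x, y) {
  stopifnot(length(x) == length(y))  # fail early instead of silently recycling
  sum(is.na(x) & is.na(y))
}

both_na_checked(c(NA, 1, 2, 4), c(NA, NA, 1, 4))
# 1

# A length mismatch now stops with an error; try() captures it for inspection
res <- try(both_na_checked(1:3, 1:2), silent = TRUE)
inherits(res, "try-error")
# TRUE
```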

Learning points

Functions aren’t as daunting as I thought. Writing one can be broken down into steps. First, decide what you want the function to automate. Next, identify the input variables. Then try the code out on a simple input. Finally, wrap the code in a function and give it a proper name. Even better, compile your functions into a package for future use.

Reference

https://r4ds.had.co.nz/

https://jrnold.github.io/r4ds-exercise-solutions/

Citation

For attribution, please cite this work as

lruolin (2021, May 25). pRactice corner: Functions. Retrieved from https://lruolin.github.io/myBlog/posts/20210525_Tidyverse Chap 15 - Functions/

BibTeX citation

@misc{lruolin2021functions,
  author = {lruolin, },
  title = {pRactice corner: Functions},
  url = {https://lruolin.github.io/myBlog/posts/20210525_Tidyverse Chap 15 - Functions/},
  year = {2021}
}